Abstract

Spider silk fibers have impressive mechanical properties and are primarily composed of highly repetitive structural proteins (termed spidroins) encoded by a single gene family. Most characterized spidroin genes are incompletely known because of their extreme size (typically >9 kb) and repetitiveness, which limits understanding of the evolutionary processes that gave rise to their unusual gene architectures. The only complete spidroin genes characterized thus far encode the dragline silk proteins of the Western black widow, Latrodectus hesperus. Here, we describe the first complete gene sequence encoding the aciniform spidroin AcSp1, the primary component of spider prey-wrapping fibers. L. hesperus AcSp1 contains a single enormous (~19 kb) exon. The AcSp1 repeat sequence is exceptionally conserved between two widow species (~94% identity) and between widows and distantly related orb-weavers (~30% identity), consistent with a history of strong purifying selection on its amino acid sequence. Furthermore, the 16 repeats (each 371-375 amino acids long) found in black widow AcSp1 are, on average, >99% identical at the nucleotide level. A combination of stabilizing selection on amino acid sequence, selection on silent sites, and intragenic recombination likely explains the extreme homogenization of AcSp1 repeats. In addition, phylogenetic analyses of spidroin paralogs support a gene duplication event occurring concomitantly with specialization of the aciniform glands and the tubuliform glands, which synthesize egg-case silk. With repeats that are dramatically different in length and amino acid composition from those of dragline spidroins, our L. hesperus AcSp1 expands the knowledge base for developing silk-based biomimetic technologies.

Key words: aciniform silk, concerted evolution, full-length gene, Latrodectus hesperus, spidroin, Western black widow. 



Introduction 

Spiders (Araneae) rely on silk throughout their lifetime and are unparalleled in the diversity of silks they can synthesize. A single orb-web weaving spider (Orbiculariae, fig. 1) possesses seven types of specialized abdominal silk glands. Each gland type produces a different silk fiber or glue that has a unique function (Foelix 2010). For example, major ampullate glands produce the dragline, tubuliform glands synthesize large-diameter egg-case silk fibers, capture spiral threads of the orb-web originate in the flagelliform glands, and prey-wrapping silk is synthesized in aciniform glands. Diversity of silk function is paralleled by diversity of silk mechanical properties (Gosline et al. 1999; Blackledge and Hayashi 2006). Dragline silk approaches the tensile strength of steel, and capture-spiral fibers can stretch by more than one and a half times their original length, an order of magnitude greater than dragline fibers (Denny 1976; Gosline et al. 1999; Blackledge and Hayashi 2006). In garden orb-weavers (Argiope argentata and A. trifasciata), prey-wrapping silk combines high extensibility with tensile strength to form a fiber that is twice as tough as the dragline of those species (Hayashi et al. 2004; Blackledge and Hayashi 2006) and is one of the toughest silks measured thus far for any species (Agnarsson et al. 2010).



Differences among silks in mechanical properties derive in large part from differences in the protein composition of each fiber type (Hayashi and Lewis 1998, 2001; Gosline et al. 1999; Hayashi et al. 1999). Spider silk fibers are primarily composed of one or more unique structural proteins termed spidroins (a contraction of "spider fibroin"), which are members of a single gene family (Guerette et al. 1996; Gatesy et al. 2001). In orb-weaving spiders, different gland types secrete different spidroins. For instance, major ampullate glands express the dragline spidroins MaSp1 and MaSp2 (Xu and Lewis 1990; Hinman and Lewis 1992; Sponner et al. 2005), and aciniform glands synthesize AcSp1 (Hayashi et al. 2004). Because the properties of each fiber type trace back to particular spidroins, spider silks could be used in a plethora of biomimetic applications that capitalize on transgenic technology (Sponner 2007). Furthermore, spider silks represent a spectacular example of functional diversification via gene duplication followed by sequence and expression divergence.

Efforts to understand the molecular evolution of spidroins and to create recombinant spider silks have been hampered by the difficulty of characterizing complete spidroin-encoding sequences. Thus far, only two complete spidroin genes have been described: MaSp1 and MaSp2 of the Western black widow, Latrodectus hesperus (Ayoub, Garb, Tinghitella, et al. 2007). Spidroins are extremely large proteins (200-350 kDa; e.g., Hayashi et al. 1999; Sponner et al. 2005; Ayoub, Garb, Tinghitella, et al. 2007; Vasanthavada et al. 2007) made up almost entirely of repeated blocks of amino acids (aa) flanked by short (<150 aa) nonrepetitive amino (N)- and carboxy (C)-terminal domains. The N- and C-terminal domains of spidroins are conserved in length and share amino acid signatures across gene family members (e.g., Guerette et al. 1996; Gatesy et al. 2001; Motriuk-Smith et al. 2005; Garb et al. 2010). Although these terminal domains are involved in functions general to spidroins (e.g., fiber assembly; Ittah et al. 2006; Askarieh et al. 2010; Hagn et al. 2010; Eisoldt et al. 2012), the repetitive regions are thought to be responsible for the variation in mechanical properties among fiber types (Gosline et al. 1999; Hayashi et al. 1999). Secondary structures, such as beta-pleated sheets or beta-turns, have been predicted for a few simple amino acid sequence motifs that are common to a subset of spidroins (Hayashi et al. 1999; Holland, Creager, et al. 2008; Holland, Jenkins, et al. 2008; Jenkins, Creager, Butler, et al. 2010; Jenkins, Creager, Lewis, et al. 2010). Some spidroins, such as MaSp1, MaSp2, and Flag (the capture-spiral spidroin), string together a subset of these simple motifs to form a unit called an ensemble repeat, which can then be repeated tens to hundreds of times within one spidroin molecule (e.g., Gatesy et al. 2001; Ayoub, Garb, Tinghitella, et al. 2007). Other spidroins, however, such as AcSp1, have longer and more complex repeat units with only a few of these simple motifs (Gatesy et al. 2001; Hayashi et al. 2004; Garb and Hayashi 2005; Garb et al. 2007; Starrett et al. 2012).

Fig. 1. Relationships among spider species used in this study based on hypotheses of Scharff and Coddington (1997), Coddington (2005), Ayoub, Garb, Hedin, et al. (2007), Kuntner et al. (2008), and Elices et al. (2009). Species abbreviations precede the species names. Major taxonomic groups are bracketed.

Although much work has focused on spidroins with simple repeats, few studies have investigated spidroins with long, complex repeats like AcSp1 (Rising et al. 2011). AcSp1 of A. trifasciata possesses more than fourteen 200-aa repeats that are virtually identical to each other (Hayashi et al. 2004). Although all spidroins show high levels of identity among ensemble repeats (e.g., Gatesy et al. 2001; Garb and Hayashi 2005; Ayoub, Garb, Tinghitella, et al. 2007; Garb et al. 2007; Starrett et al. 2012), A. trifasciata AcSp1 is exceptional, with an average of 99.9% pairwise identity among repeats at the nucleotide level (Hayashi et al. 2004). Although partial AcSp1 cDNAs have been described from the cobweb-weaving Western black widow (Vasanthavada et al. 2007) and the feather-legged orb-weaver Uloborus diversus (Garb et al. 2006), these cDNAs were too short to evaluate whether extreme homogeneity among repeats is general. Intragenic concerted evolution has been cited as the process that homogenizes repeats of various spidroins, but it is unclear why A. trifasciata AcSp1 is dramatically more homogenized than other spidroin paralogs. Characterization of a complete AcSp1 gene from another species would address whether extreme homogeneity of intragenic repeats is a general property of AcSp1 and would provide insight into the molecular evolution of the complex repeat unit.

Aciniform spidroins may prove instrumental in deciphering the history of spider silk gene duplications. The number of functionally specialized silk gland types is positively correlated with the number of spidroin paralogs (Garb et al. 2007). This association is consistent with the hypothesis that gland types and spidroins have co-evolved (Hayashi and Lewis 1998). Virtually all spiders possess spherical to pear-shaped glands that are similar in structure to the aciniform or pyriform glands of the superfamily Orbiculariae (fig. 1; Shultz 1987). If these simple acinous-shaped structures represent the ancestral gland type, as proposed by Shultz (1987), we predict that aciniform spidroins would be recovered in a basal phylogenetic position relative to other spidroins. Shultz (1987) also proposed that tubuliform glands are specialized aciniform glands. We thus predict that spidroins expressed in aciniform and tubuliform glands will be closely related. Testing these predictions requires reconstructing spidroin gene trees from the nonrepetitive terminal domains, because positional homology is difficult to determine among repetitive sequences (e.g., Gatesy et al. 2001). Complete spidroin-encoding sequences double the amount of phylogenetic information relative to partial N- or C-terminal sequences and ensure that the two domains are accurately associated (Garb et al. 2010).

In this study, we report a complete AcSp1 gene from the Western black widow, L. hesperus, identified from a fully sequenced 39 kb region of genomic DNA, and partial AcSp1 gene sequences from the brown widow, L. geometricus (Theridiidae). These data were used to address three related goals: 1) to test whether extreme homogeneity of intragenic sequence repeats is a general property of AcSp1, 2) to evaluate whether spidroins co-evolved with glandular specialization, specifically testing the hypotheses that AcSp1 has a basal position in spidroin gene trees and groups with tubuliform spidroins, and 3) to determine whether the gene structure of AcSp1 is similar to MaSp1 and MaSp2 in lacking introns, making it an expedient template for the production of recombinant silk proteins. Our results show that black widow AcSp1 is composed of a single enormous exon (18,999 bases of coding sequence). We demonstrate that extreme homogeneity of intragenic repeats is a general feature of AcSp1, with sequence repeats being exceptionally conserved between Latrodectus species and among orbicularians. This level of conservation and intragenic homogenization may be explained by the combined forces of stabilizing selection and intragenic concerted evolution acting on the AcSp1 gene. Finally, our phylogenetic analyses of spidroin paralogs provide novel support for co-evolution of gene duplications with glandular specialization by uncovering a close relationship between AcSp1 and tubuliform spidroins. In addition to these contributions to understanding the molecular evolution of spidroins, our full-length AcSp1 provides a complete genetic blueprint for biomimetic applications that capitalize on the extreme toughness of aciniform silk fibers.

Results 

We sequenced and assembled 39,269 base pairs (bp) of L. hesperus genomic DNA (JX978171), including a complete open reading frame (ORF) that is 18,999 bp in length and predicted to encode a 6,332 aa AcSp1. This is the longest coding region described for any spidroin. The most abundant amino acids in AcSp1 are alanine (15.1%), serine (13.1%), and glycine (11.3%) (fig. 2). Despite the prevalence of alanine and glycine, the amino acid motifs poly-A, GGX, GPG, and poly-GA that dominate the MaSp1 and MaSp2 dragline spidroins (Ayoub, Garb, Tinghitella, et al. 2007) are absent or rare in AcSp1 (supplementary fig. S1, Supplementary Material online). Furthermore, the amino acid composition of AcSp1 is much more evenly distributed than that of MaSp1 or MaSp2, in which more than 60% of the protein is made up of alanine or glycine (fig. 2). As has been observed with other spidroin genes, codon usage in AcSp1 for alanine, glycine, threonine, and proline is skewed toward codons that end in adenine or thymine (77.2% of alanine codons, 82.1% of glycine codons, 72% of threonine codons, and 82.4% of proline codons; supplementary table S1, Supplementary Material online).

Fig. 2. Amino acid compositions of complete Latrodectus hesperus AcSp1, MaSp1, and MaSp2 (panels: AcSp1, MaSp1, MaSp2). Three-letter amino acid abbreviations are used.
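The length and composition figures reported above rest on simple bookkeeping: an 18,999 bp ORF contains 6,333 codons, one of which is the stop codon, leaving 6,332 amino acids. The sketch below shows, under stated assumptions, how amino acid percentages and the share of codons ending in A or T could be tallied from a coding sequence. The function names and the short toy sequence are hypothetical illustrations, not the AcSp1 sequence (JX978171) or the authors' pipeline; the codon table is the standard genetic code.

```python
from collections import Counter

# Standard genetic code (NCBI table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(cds):
    """Split a coding sequence into complete codons (a trailing partial codon is ignored)."""
    cds = cds.upper()
    return [cds[p:p + 3] for p in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon (ambiguous bases are not handled)."""
    protein = []
    for codon in codons(cds):
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def composition(protein):
    """Percentage of each amino acid, most abundant first."""
    counts = Counter(protein)
    return {aa: 100.0 * n / len(protein) for aa, n in counts.most_common()}


def at3_percentage(cds, amino_acid):
    """Percentage of codons for `amino_acid` whose third position is A or T."""
    matching = [c for c in codons(cds) if CODON_TABLE.get(c) == amino_acid]
    if not matching:
        return float("nan")
    return 100.0 * sum(c[2] in "AT" for c in matching) / len(matching)


# 18,999 bp ORF -> 6,333 codons -> 6,332 amino acids once the stop codon is excluded.
print(18999 // 3 - 1)

# Hypothetical toy ORF: Met-Ala-Ala-Ala-Gly-Gly-Ser-stop.
toy_cds = "ATGGCAGCTGCCGGAGGTTCATAA"
protein = translate(toy_cds)
print(protein, composition(protein))
print(at3_percentage(toy_cds, "A"), at3_percentage(toy_cds, "G"))
```

On real data the same tallies would be run over the full AcSp1 ORF; the toy sequence only exercises the arithmetic.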

AcSp1 alternates between hydrophobic and hydrophilic regions (range of Kyte-Doolittle hydrophilicity = -3.4 to 3.3) and on average is slightly hydrophobic (average Kyte-Doolittle hydrophilicity = -0.16). The shifts from hydrophilic to hydrophobic regions in the N- and C-termini are very similar to those seen in L. hesperus MaSp1, but the repetitive region of AcSp1 does not display the consistent shift seen in MaSp1 between hydrophilic glycine-rich and hydrophobic alanine-rich sequences (supplementary fig. S2, Supplementary Material online).
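A profile like the one summarized above is typically a sliding-window average of per-residue Kyte-Doolittle values. The sketch below assumes the standard Kyte and Doolittle (1982) hydropathy scale, on which positive values are hydrophobic, and treats the "hydrophilicity" reported here as the sign-reversed value (consistent with a negative average being described as slightly hydrophobic). The window size of 9 residues and the function names are assumptions, since the text does not state the settings used.

```python
# Kyte & Doolittle (1982) hydropathy scale: positive values are hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of each full window along the sequence (window size is an assumption)."""
    if not 0 < window <= len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def hydrophilicity_profile(protein, window=9):
    """Assumed convention: hydrophilicity is the sign-reversed hydropathy."""
    return [-v for v in hydropathy_profile(protein, window)]


def mean_hydropathy(protein):
    """Grand average of hydropathy over the whole sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in protein.upper()) / len(protein)


# Hypothetical toy sequence: an alanine-rich stretch followed by a glycine/serine-rich stretch.
toy = "AAAAAAAAAGSGGSGQGGYGSG"
profile = hydrophilicity_profile(toy, window=5)
print(round(min(profile), 2), round(max(profile), 2), round(-mean_hydropathy(toy), 2))
```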

Partial sequencing of four additional genomic clones containing AcSp1 (JX978172-JX978175) revealed that they were very similar to each other and to sequences amplified from four individual spiders (JX978176-JX978179). The average uncorrected p-distance across the region sequenced from all clones and individuals was 1.3% (range = 0.077-2.9%). The previously described L. hesperus "AcSp1-like" cDNA sequence was only 1.2% different from our completely sequenced clone.
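The uncorrected p-distance reported above is the proportion of aligned sites at which two sequences differ. A minimal sketch, assuming pre-aligned sequences of equal length and assuming that gaps and ambiguous bases are skipped pair by pair (the text does not say how they were handled); the function name is hypothetical:

```python
def p_distance(seq_a, seq_b, ignore=frozenset("-N?")):
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a character in `ignore` are skipped
    (pairwise deletion); that treatment of gaps is an assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in ignore or b in ignore:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned fragments, not actual clone sequences.
print(p_distance("ACGTACGTAC", "ACGTACGTTC"))  # one difference in ten sites
```

Averaging this value over every pair of clones and individuals gives a summary of the kind quoted above.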
Homogenization and Conservation of AcSp1
Our L. hesperus AcSp1 includes 15 iterations of a 1,125 bp repeat and one slightly shorter repeat variant of 1,113 bp. There is also very little sequence variation among repeats (supplementary fig. S1, Supplementary Material online). At the amino acid level, the average intragenic pairwise difference among repeats was just 0.68% for L. hesperus (range = 0-2.4%). AcSp1 intragenic repeats are similarly homogenized in three species of Araneidae: 0.62% for A. trifasciata (range = 0-1.9%), 2.1% for A. amoena (range = 0-3.3%), and 2.4% for Araneus ventricosus (range = 0-6.1%). Differences between repeats from closely related species were always higher than intragenic differences. The lowest pairwise difference between species was for the comparison of L. hesperus repeats to the L. geometricus consensus repeat (average = 5.9%, range = 5.6-7.0%; JX978182). Average pairwise differences for the araneid repeats were 26.0% between Aran. ventricosus and A. amoena, 23.0% between Aran. ventricosus and A. trifasciata, and 18.2% between A. amoena and A. trifasciata.
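The intragenic averages and ranges above summarize every pairwise comparison among the repeats of one gene, whereas the interspecific figures compare each repeat with another species' consensus. A sketch of both summaries, assuming the repeats are already aligned to equal length and that differences are counted as simple percent mismatch; the alignment step, gap treatment, and function names are assumptions:

```python
from itertools import combinations
from statistics import mean


def percent_difference(a, b):
    """Percent of aligned positions that differ (gaps count as ordinary characters here)."""
    if len(a) != len(b):
        raise ValueError("repeats must be aligned to the same length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def intragenic_summary(repeats):
    """Average, minimum, and maximum difference over all pairs of repeats from one gene."""
    diffs = [percent_difference(a, b) for a, b in combinations(repeats, 2)]
    return mean(diffs), min(diffs), max(diffs)


def versus_consensus(repeats, consensus):
    """Average, minimum, and maximum difference of each repeat from another species' consensus."""
    diffs = [percent_difference(r, consensus) for r in repeats]
    return mean(diffs), min(diffs), max(diffs)


# Hypothetical five-residue "repeats" for illustration only.
species_1 = ["SAAQG", "SAAQG", "SAVQG"]
print(intragenic_summary(species_1))
print(versus_consensus(species_1, "SAAQN"))
```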

Homogeneity of Latrodectus AcSp1 repeats is maintained at the nucleotide level (average pairwise difference = 0.58%, range = 0-2.2%). This level of intragenic nucleotide divergence among repeats is even lower than that observed in other L. hesperus spidroins, such as TuSp1 (average = 2.9%, range = 0.18-7.1%) and MaSp1 (average = 2.4%, range = 0.28-6.3%). We tested whether homogeneity of L. hesperus AcSp1 intragenic repeats could result solely from constraints on amino acid sequence by calculating the pairwise number of synonymous substitutions per synonymous site (Ks) and nonsynonymous substitutions per nonsynonymous site (Ka) between intragenic repeats. The extremely low divergence among L. hesperus AcSp1 repeats is due to very few nonsynonymous and very few synonymous substitutions (average Ka = 0.0022, average Ks = 0.0081; fig. 3). Both types of substitutions are five to nine times lower for intragenic AcSp1 repeats than for intragenic MaSp1 repeats (average Ka = 0.011, average Ks = 0.057) and TuSp1 repeats (average Ka = 0.017, average Ks = 0.076; fig. 3).
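Ka and Ks are usually estimated by counting synonymous and nonsynonymous sites and differences codon by codon and then correcting for multiple hits. The text does not specify the software or method used, so the sketch below is an assumption: a simplified Nei-Gojobori (1986) count with a Jukes-Cantor correction. Changes to stop codons are counted as nonsynonymous when tallying sites, mutational pathways that pass through a stop codon are discarded, and gapped codons are skipped. Published implementations differ in such details, so numbers from this sketch need not match the values reported here; all function names are hypothetical.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_sites(codon):
    """(synonymous, nonsynonymous) sites of one codon; each single-base change weighs 1/3."""
    syn = 0.0
    for pos in range(3):
        for base in "ACGT":
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                syn += (CODE[mutant] == CODE[codon]) / 3
    return syn, 3.0 - syn


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences averaged over valid mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_sum = non_sum = 0.0
    valid_paths = 0
    for order in permutations(positions):
        current, syn, non = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                break  # pathway through a stop codon is discarded
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        else:
            syn_sum += syn
            non_sum += non
            valid_paths += 1
    if valid_paths == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return syn_sum / valid_paths, non_sum / valid_paths


def jukes_cantor(p):
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, in-frame coding sequences without stop codons."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn_sites = non_sites = syn_diffs = non_diffs = 0.0
    for p in range(0, len(seq1), 3):
        c1, c2 = seq1[p:p + 3].upper(), seq2[p:p + 3].upper()
        if "-" in c1 or "-" in c2:
            continue  # skip gapped codons
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        syn_sites += (s1 + s2) / 2
        non_sites += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        syn_diffs += sd
        non_diffs += nd
    return jukes_cantor(non_diffs / non_sites), jukes_cantor(syn_diffs / syn_sites)


# Hypothetical three-codon repeats: GCT -> GCC is a synonymous (Ala -> Ala) change.
print(ka_ks("GCTGGATCA", "GCCGGATCA"))
```

Averaging `ka_ks` over all pairs of aligned repeats would give summaries comparable in kind (not necessarily in value) to the averages in figure 3.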

The AcSp1 repeats are more conserved than the adjacent nonrepetitive C-terminus in Latrodectus (9.1% difference between C-termini vs. 5.9% average pairwise difference between repeats of L. hesperus and L. geometricus) but less conserved in the araneids (14.1% difference between C-termini vs. 23.0% average pairwise difference between repeats of Aran. ventricosus and A. trifasciata). At the nucleotide level, AcSp1 repeats are also highly conserved between L. hesperus and L. geometricus relative to interspecific comparisons of the adjacent N- and C-termini and of TuSp1 (fig. 3). We compared selective pressures on AcSp1 repeats with those on the adjacent N- and C-terminal encoding regions and on repeats of paralogous spidroins. Nonsynonymous substitutions between L. hesperus and L. geometricus are similarly low for AcSp1 repeats (Ka = 0.026), TuSp1 repeats (Ka = 0.033), N-termini of AcSp1 (Ka = 0.0095), and C-termini of AcSp1 (Ka = 0.036) and TuSp1 (Ka = 0.066). However, synonymous substitutions in AcSp1 repeats appear to be severely suppressed (fig. 3). The average number of interspecific synonymous substitutions per synonymous site between AcSp1 repeats (Ks = 0.050) is 11 times lower than that of the adjacent C-termini (Ks = 0.58), five times lower than that of the adjacent N-termini (Ks = 0.27), and seven times lower than that of the repeats of TuSp1 (Ks = 0.37).

Fig. 3. Pairwise values of nonsynonymous substitutions per nonsynonymous site (Ka; gray) and synonymous substitutions per synonymous site (Ks; black) for intragenic comparisons among Latrodectus hesperus spidroin repeats or interspecific comparisons between L. hesperus and L. geometricus for corresponding regions of spidroins. Values shown for repeats are averaged across all pairwise comparisons.
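The fold differences quoted above follow from dividing each reported Ks value by the Ks of the AcSp1 repeats; the text rounds the resulting ratios down to whole numbers:

```latex
\frac{K_s^{\text{C-term}}}{K_s^{\text{repeat}}} = \frac{0.58}{0.050} = 11.6, \qquad
\frac{K_s^{\text{N-term}}}{K_s^{\text{repeat}}} = \frac{0.27}{0.050} = 5.4, \qquad
\frac{K_s^{\text{TuSp1}}}{K_s^{\text{repeat}}} = \frac{0.37}{0.050} = 7.4
```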

AcSp1 repeats are also conserved among more distantly related species (supplementary figs. S3 and S4, Supplementary Material online). The repeat length of L. hesperus (375 aa) is similar to that of the feather-legged spider U. diversus (357 aa) but almost twice as long as the araneid AcSp1 repeats (200-215 aa). BLASTP produced significant alignments between the araneid AcSp1 repeats and the first and second halves of the L. hesperus and U. diversus AcSp1 repeats. We thus split the L. hesperus and U. diversus consensus repeats into halves and aligned the two parts with the araneid AcSp1 repeats (supplementary figs. S3 and S4, Supplementary Material online). Phenetic (neighbor-joining) and phylogenetic (maximum parsimony, MP) clustering with midpoint rooting indicated that the first and second parts of the Latrodectus repeats are more similar to each other than to the repeats of other species. In contrast, the second half of the U. diversus repeat is more similar to araneid repeats than to the first part of the U. diversus repeat (supplementary fig. S3B, Supplementary Material online).
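Neighbor joining builds a tree by repeatedly merging the pair of clusters that minimizes the Saitou-Nei Q criterion and then re-estimating distances to the new node. The sketch below implements that core loop for a small distance matrix. Midpoint rooting, which the analysis above also used, is not included; the function name, data layout, and example distances are hypothetical (the matrix is a standard textbook example, not data from this study).

```python
def neighbor_joining(labels, matrix):
    """Neighbor joining (Saitou and Nei 1987) on a symmetric distance matrix.

    Returns a list of joins (node_x, node_y, branch_x, branch_y) in the order they
    were made, plus the final edge (node_a, node_b, length) of the unrooted tree.
    """
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        pairs = [(x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]]
        # Join the pair that minimizes Q(x, y) = (n - 2) d(x, y) - r(x) - r(y).
        x, y = min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        branch_x = d[x][y] / 2 + (r[x] - r[y]) / (2 * (n - 2))
        branch_y = d[x][y] - branch_x
        joins.append((x, y, branch_x, branch_y))
        u = f"({x},{y})"  # label of the new internal node
        d[u] = {}
        for z in nodes:
            if z not in (x, y):
                d[u][z] = d[z][u] = (d[x][z] + d[y][z] - d[x][y]) / 2
        nodes = [z for z in nodes if z not in (x, y)] + [u]
    a, b = nodes
    return joins, (a, b, d[a][b])


labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final_edge = neighbor_joining(labels, matrix)
for join in joins:
    print(join)
print(final_edge)
```

In practice the distances would come from aligned repeat halves (for example, percent differences), and the resulting unrooted tree would then be midpoint-rooted.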

BLASTP searches of the NCBI nr protein database additionally recognized significant similarity of the L. hesperus AcSp1 repeat to other spidroin paralogs (E < 10^-10, compared with E < 10^-26 for AcSp1 orthologs). These included TuSp1 (egg-case) spidroins as well as other spidroins with complex repeats characterized from araneomorphs and mygalomorphs. However, based on the nonrepetitive terminal regions, these spidroins do not form a monophyletic group (figs. 4 and 5).
Fig. 4. Maximum likelihood tree of combined spidroin N- and C-terminal encoding regions. Species abbreviations are defined in figure 1 and supplementary table S2, Supplementary Material online. Thickened branches indicate relationships supported by >50% of MP bootstrap replicates and >0.95 Bayesian posterior probability for both amino acids and nucleotides. Support values for numbered nodes are shown in supplementary table S3, Supplementary Material online. Rooting with the mygalomorph spidroin B.c. fibroin1 resulted in the fewest inferred duplications and losses. Slashed lines indicate that the branch to B.c. fibroin1 was arbitrarily shortened. Dots indicate L. hesperus paralogs. Spidroins in bold had significant similarity to the L. hesperus AcSp1 repeat according to BLASTP. Major clades of spidroins are bracketed.



We identified only a few conserved sequences flanking AcSp1, which included a TATA box and a four-base motif (CACG) identified upstream of other spidroin genes (Motriuk-Smith et al. 2005; Ayoub, Garb, Tinghitella, et al. 2007). Other conserved sequences were part of transposable elements (supplementary methods and results and supplementary fig. S5, Supplementary Material online).

Relationship of AcSp1 to Other Spidroins
Consistent with the hypothesis that spidroins coevolved with glandular specialization, we found phylogenetic support for a sister relationship between AcSp1 and the spidroin expressed in tubuliform glands, TuSp1 (figs. 4 and 5). Parsimony, likelihood, and Bayesian analyses of combined N- and C-terminal amino acids and encoding nucleotides for 29 spidroins (supplementary table S2, Supplementary Material online) grouped Latrodectus AcSp1 with a clade of TuSp1s with strong support (fig. 4; supplementary fig. S6 and table S3, Supplementary Material online). This result contrasts with the weakly supported grouping of Flag with TuSp1 in the most comprehensive phylogenetic analysis of spidroin paralogs prior to this one, which did not include AcSp1 (Garb et al. 2010). In our analyses, Flag consistently grouped with Diguetia canities MaSp-like sequences, but with poor support (fig. 4 and supplementary fig. S6, Supplementary Material online). Support for grouping AcSp1 with TuSp1 derived largely from N-terminal amino acids and nucleotides (partitioned decay index = 5 for N-terminal amino acids or nucleotides; partitioned decay index = 0 or -3 for C-terminal amino acids and nucleotides, respectively; Baker and DeSalle 1997).

Other relationships among spidroins were consistent with patterns described by Garb et al. (2010): the 14 well-supported nodes (e.g., >0.95 posterior probability, pp) out of a total of 23 nodes from Garb et al. (2010) were also recovered with strong support in our analyses. For instance, we found strong support for a monophyletic group of TuSp1 present in orbicularians and a member of the RTA-clade, monophyletic araneoid Flag, and monophyletic araneoid MaSp1 and MaSp2 (fig. 4; supplementary fig. S6 and table S3, Supplementary Material online).
Fig. 5. Maximum likelihood tree of the expanded dataset of C-terminal region encoding nucleotides. Species abbreviations are defined in figure 1 and supplementary table S2, Supplementary Material online. Thickened branches indicate that the node is supported by >50% of MP bootstrap replicates and >0.95 Bayesian posterior probability for both amino acids and nucleotides. Support values for numbered nodes are shown in supplementary table S4, Supplementary Material online. Rooting with the mygalomorph spidroins resulted in the fewest inferred duplications and losses. Dots indicate L. hesperus paralogs. Spidroins in bold had significant similarity to the L. hesperus AcSp1 repeat according to BLASTP. Major clades of spidroins are bracketed.



Expanding the C-terminal data set to include 46 spidroins (supplementary table S2, Supplementary Material online) recovered a monophyletic group of orbicularian AcSp1 (>0.95 pp) in all analyses except parsimony searches of amino acids (fig. 5; supplementary table S4 and fig. S7, Supplementary Material online). The grouping of AcSp1 with TuSp1 was also recovered in nucleotide ML and amino acid Bayesian analyses, albeit with low support (<0.95 pp). Well-supported relationships described earlier were also recovered with high support (>0.95 pp or >75% of MP bootstrap replicates) by the expanded data set, including monophyletic TuSp1, monophyletic araneoid MaSp1 and MaSp2, and monophyletic Flag. C-terminal sequences also strongly supported monophyletic mygalomorph spidroins (excluding Aliatypus gulosus fibroin1) and monophyletic araneoid PySp (the spidroin in attachment cement fibers). In contrast to the poorly supported grouping of Flag with D. canities MaSp-like sequences in the smaller (29-spidroin) data set, the larger (46-spidroin) C-terminal data set grouped D. canities MaSp-like sequences with two spidroins described from Plectreurys tristis with strong support (>0.95 pp, >60% of MP bootstrap replicates of nucleotides), although this relationship was not recovered by MP analysis of amino acids (fig. 5; supplementary table S4 and fig. S7, Supplementary Material online). D. canities and P. tristis are representatives of the araneomorph clade Haplogynae, which is divergent from the orbicularians (fig. 1).

Species tree–gene tree reconciliation supported rooting with the mygalomorph spidroin B.c. fibroin1 for the phylogenetic trees based on both N- and C-terminal data. This rooting resulted in the lowest duplication/loss scores (ML: 14 duplications, 51 losses; amino acid MP: 13 duplications, 44 or 47 losses depending on MP tree; nucleotide MP: 13 duplications, 47 losses; amino acid Bayes: 14 duplications, 43 losses; nucleotide Bayes: 13 duplications, 44 losses). Root placement for the expanded C-terminal data set varied among trees. For the ML and nucleotide Bayes trees, rooting with all the mygalomorph spidroins resulted in the lowest duplication/loss scores (ML: 21 duplications and 61 losses; nucleotide Bayes: 20 duplications, 49 losses). However, the MP trees based on amino acids possessed eight equally optimal root placements (supplementary fig. S7, Supplementary Material online; 23 duplications and 74 losses). The nucleotide MP tree rooted with a clade of spidroins found in divergent taxa (A.g. fibroin1 from a mygalomorph, P.t. fibroin4 from a haplogyne araneomorph, and Flag from an orbicularian araneomorph) resulted in the lowest score (supplementary fig. S7, Supplementary Material online; 25 duplications and 76 losses). Finally, the Bayesian tree based on amino acids rooted with a pairing of a mygalomorph and a haplogyne spidroin resulted in the fewest duplications (23) and losses (59) (supplementary fig. S7, Supplementary Material online).
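Duplication and loss scores of the kind compared above are commonly computed by mapping each gene-tree node to the most recent common ancestor (LCA) of its descendants' species in the species tree. A node that maps to the same species-tree node as one of its children is a duplication, and losses are counted from the species-tree branches skipped along each gene-tree edge. The sketch below assumes binary, rooted trees written as nested tuples with gene-tree leaves labeled by species; it is a hypothetical illustration, not the software used in the study, and rerooting the gene tree (needed to compare candidate roots) is left out. Scoring each candidate rooting with this function and keeping the lowest score would mirror the selection criterion described above.

```python
def reconcile(gene_tree, species_tree):
    """Count (duplications, losses) for a rooted binary gene tree via LCA mapping.

    Trees are nested 2-tuples; leaves are species names. Gene-tree leaves name the
    species each gene comes from. Species-tree leaf names must be unique.
    """
    parent, depth = {}, {}

    def index(node, par, dep):
        parent[node], depth[node] = par, dep
        if isinstance(node, tuple):
            for child in node:
                index(child, node, dep + 1)

    index(species_tree, None, 0)

    def lca(x, y):
        while depth[x] > depth[y]:
            x = parent[x]
        while depth[y] > depth[x]:
            y = parent[y]
        while x != y:
            x, y = parent[x], parent[y]
        return x

    def walk(node):
        # Returns (species-tree node this gene node maps to, duplications, losses) for the subtree.
        if not isinstance(node, tuple):
            return node, 0, 0
        (m_left, d_left, l_left), (m_right, d_right, l_right) = walk(node[0]), walk(node[1])
        m = lca(m_left, m_right)
        duplication = m in (m_left, m_right)
        losses = l_left + l_right
        for m_child in (m_left, m_right):
            # Species-tree branches skipped between this node and the child;
            # a speciation node itself accounts for one branch.
            losses += depth[m_child] - depth[m] - (0 if duplication else 1)
        return m, d_left + d_right + int(duplication), losses

    _, duplications, losses = walk(gene_tree)
    return duplications, losses


species = (("A", "B"), "C")
print(reconcile((("A", "B"), "C"), species))  # gene tree matches the species tree
print(reconcile((("A", "C"), "B"), species))  # discordant gene tree
```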

Discussion 

Our complete black widow AcSp1 contains the longest spidroin coding sequence (18,999 bp) described thus far. Only four other complete spidroin coding sequences have been previously reported: CySp1 (9.1 kb) and CySp2 (9.2 kb; CySp is a synonym for TuSp) cDNAs from the wasp spider, A. bruennichi (Zhao et al. 2006), and MaSp1 (9.4 kb) and MaSp2 (11.3 kb) genes from the Western black widow (Ayoub, Garb, Tinghitella, et al. 2007). Our black widow AcSp1 surpasses all of these at 19 kb, encoding a 6,332 amino acid protein with a remarkably large predicted molecular weight of 630 kDa. In contrast, gel electrophoresis indicated that black widow AcSp1 is a 300 kDa protein (Vasanthavada et al. 2007). The discrepancy between predicted and observed sizes could be explained by extreme allelic length variation of AcSp1 in black widows. The nearly identical (>98% pairwise nucleotide identity) tandem repeats found in AcSp1 could have facilitated unequal crossing over that resulted in the rapid loss or gain of repeats. There is precedent for huge size variation in a spider silk gene: Chinali et al. (2010) documented that MaSp1 from 100 individual golden orb-weavers (Nephila clavipes) can range from 10 to 17.5 kb. The discrepancy between observed and predicted sizes could also be attributed to various posttranscriptional or translational modifications (e.g., Tran et al. 2011) that have yet to be observed with AcSp1.
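Unequal crossing over, proposed above as one route to allelic length variation, can be pictured as recombination between two tandem-repeat arrays that pair out of register by one or more repeat units. The toy function below (hypothetical names; repeats are just labels, not sequences) shows how a misalignment of k units yields one product that has gained k repeats and one that has lost k. It makes no claim about how often this occurs in AcSp1 or which allele sizes exist.

```python
def unequal_crossover(allele_a, allele_b, break_a, break_b):
    """Exchange the tails of two tandem-repeat arrays at repeat boundaries.

    break_a and break_b are repeat indices; when they differ the arrays were
    misaligned and the two recombinant products differ in repeat number.
    """
    if not (0 <= break_a <= len(allele_a) and 0 <= break_b <= len(allele_b)):
        raise ValueError("breakpoints must fall within the arrays")
    product_1 = allele_a[:break_a] + allele_b[break_b:]
    product_2 = allele_b[:break_b] + allele_a[break_a:]
    return product_1, product_2


# Two hypothetical 16-repeat alleles whose arrays pair out of register by 2 units.
allele_a = [f"a{i}" for i in range(1, 17)]
allele_b = [f"b{i}" for i in range(1, 17)]
longer, shorter = unequal_crossover(allele_a, allele_b, break_a=10, break_b=8)
print(len(longer), len(shorter))
```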

Similar to Western black widow MaSp1 and MaSp2, AcSp1 has a peculiar gene structure because it lacks introns. AcSp1 stands out with an exon size more than double that of MaSp1 and MaSp2, exceeding the longest exons known in human (17 kb), chimpanzee (11.6 kb), mouse (17.1 kb), zebrafish (12 kb), and the roundworm Caenorhabditis elegans (15 kb), and approaching the longest known exon in Drosophila melanogaster (27.7 kb) (Peng et al. 2009). Although limited genomic information is available for other species, single-exon silk genes may be the rule for widow spiders. Partial sequences of Western black widow and brown widow MaSp1, MaSp2, MiSp, TuSp1, and AcSp1 contain no evidence of introns (Garb and Hayashi 2005; Motriuk-Smith et al. 2005; Ayoub and Hayashi 2008; this study, unpublished data). Partial sequences of Nephila MaSp2 also lack introns (Motriuk-Smith et al. 2005). In contrast, Argiope MaSp2 and Nephila Flag have multiple introns that are nearly identical within a single gene (Hayashi and Lewis 2000; Motriuk-Smith et al. 2005). Ayoub, Garb, Tinghitella, et al. (2007) noted that single-exon genes could reflect a process of gene duplication involving retrotransposition of mRNA transcripts, which would necessarily give rise to intronless paralogs. This process of gene duplication could be the dominant mode for the spidroin gene family. However, retrotransposition often results in pseudogenes because the necessary regulatory sequences are not simultaneously duplicated (Zhang 2003). Alternatively, lack of introns may be the ancestral condition for the spidroin gene family, with Argiope MaSp2 and Nephila Flag having independently gained introns. Characterization of complete spidroin genes from divergent spider species would clarify whether spider silk genes have experienced multiple gains or losses of introns.

Homogenization and Conservation of AcSp1 Repeats
Our complete gene sequence demonstrates that black widow AcSp1 repeats are as highly homogenized as Argiope and other araneid AcSp1 repeats. The near identity of intragenic AcSp1 repeats is unusual even in comparison with other spidroins (e.g., fig. 3). Purifying selection could maintain identity among repeats. Indeed, the rate of silent substitutions exceeds that of amino acid replacements in black widow AcSp1 (Ka/Ks = 0.22), but similarly low values of Ka/Ks were found for black widow TuSp1 (fig. 3) and MaSp1 intragenic repeats (Ayoub, Garb, Tinghitella, et al. 2007). Thus, purifying selection on amino acid sequence, or stabilizing selection to maintain similar amino acid repeats within a single polypeptide, cannot by itself explain the near identity among repeats. Gene conversion and unequal crossing over, facilitated by iterated repeats, have been frequently cited as processes that lead to intragenic concerted evolution among spidroin repeats (e.g., Beckwitt et al. 1998; Gatesy et al. 2001; Hayashi et al. 2004; Garb and Hayashi 2005; Ayoub, Garb, Tinghitella, et al. 2007). Here, we discuss how selective constraints, mutation, and concerted evolution could act differently on AcSp1 compared with other spidroin-encoding genes.

AcSp1 intragenic repeats could be more homogenized than the repeats of paralogous spidroins because they experience lower mutation rates. In spidroins with simple repeats, such as MaSp1 and MaSp2, the tandem repetition of codons for amino acid sequence motifs, such as contiguous stretches of alanines, can lead to slip-strand mispairing that results in a higher mutation rate within the repeats (to the extent that repeats cannot be reliably aligned between species as distantly related as L. hesperus and L. geometricus) than in terminal regions of the genes (Ayoub, Garb, Tinghitella, et al. 2007; see Hayashi and Lewis 2000 for a similar pattern in Flag exons vs. introns). Latrodectus AcSp1 and TuSp1 repeats have complex amino acid sequences that do not contain a high proportion of these simple motifs and thus are expected to experience less localized slip-strand mispairing. In fact, the synonymous substitution rate between L. hesperus and L. geometricus TuSp1 repeats is similar to that of the adjacent C-terminal encoding region (fig. 3), suggesting similar mutation rates and/or selective constraints across the gene. In contrast, AcSp1 repeats experienced far fewer interspecific synonymous substitutions than the adjacent N- or C-termini (fig. 3).

A dramatically lower mutation rate in the repeat-encoding versus the terminal-region encoding sequences of AcSp1 seems unlikely, considering that the adjacent gene regions are in the same genomic location, have the same base composition (~55% AT), and both lack the strings of GC-rich sequence associated with higher mutation rates (e.g., Sved and Bird 1990). Codon bias in AcSp1 could constrain synonymous substitutions, but the tendency toward A or T in the third position of codons is similarly high in black widow MaSp1 and MaSp2 (Ayoub, Garb, Tinghitella, et al. 2007) and TuSp1 (e.g., 100% of Gly and 75% of Ala codons end in A or T; Garb and Hayashi 2005). Instead, stabilizing selection on AcSp1 repeats for specific mRNA secondary structures that increase mRNA stability, control translation, or prevent degradation could constrain synonymous substitutions (e.g., Katz and Burge 2003; Chamary and Hurst 2005; Meyer and Miklos 2005).
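The third-position A/T bias cited above can be tallied directly from an in-frame coding sequence. The sketch below is a hypothetical helper, not the CODONW procedure used in this study; it tracks only the four-fold degenerate Gly (GGN) and Ala (GCN) codon families mentioned in the text, and the input is a toy sequence rather than spidroin data.

```python
from collections import Counter

# Hypothetical helper: only Gly (GGN) and Ala (GCN) are tracked. Both families are
# four-fold degenerate, so the third codon position can vary without changing the amino acid.
FOURFOLD_PREFIXES = {"GG": "Gly", "GC": "Ala"}


def third_position_at(cds: str) -> dict:
    """Return, per amino acid, the fraction of codons ending in A or T.

    Assumes `cds` is already in reading frame; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    totals = Counter()
    at_ending = Counter()
    usable = len(cds) - len(cds) % 3
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        aa = FOURFOLD_PREFIXES.get(codon[:2])
        if aa is None:
            continue  # not a Gly or Ala codon
        totals[aa] += 1
        if codon[2] in "AT":
            at_ending[aa] += 1
    return {aa: at_ending[aa] / totals[aa] for aa in totals}


# Toy sequence (not real spidroin data): GGA GGT GCA GCC GCT GCA
print(third_position_at("GGAGGTGCAGCCGCTGCA"))  # {'Gly': 1.0, 'Ala': 0.75}
```

Extending the sketch to other amino acids would require a full codon table, because two-fold degenerate families constrain which third-position bases are synonymous.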

Stochasticity in the process of intragenic concerted evolution probably also contributes to the homogeneity of intragenic AcSp1 repeats and could explain different patterns of conservation of AcSp1 repeats among taxonomic groups. In the absence of selective constraints, concerted evolution should increase the apparent rate of interspecific divergence because a single mutation can rapidly proliferate to each of the repeats. Consistent with this hypothesis, araneid (A. amoena, A. trifasciata, and Aran. ventricosus) AcSp1 repeats are less conserved between species than are the adjacent C-terminal regions, a pattern also seen in MaSp1, MaSp2, and Flag in multiple interspecific comparisons (e.g., Beckwitt et al. 1998; Hayashi and Lewis 2000; Ayoub and Hayashi 2008). However, it is also possible for new mutations to be replaced by the ancestral repeat sequence during concerted evolution. By chance, only the latter may have occurred in Latrodectus. Denser taxonomic sampling of AcSp1 orthologs both above and below the species level is needed to evaluate the relative roles of selective constraints on synonymous sites, mutation rates, and concerted evolution in producing the unusual patterns of conservation in AcSp1 and its extreme intragenic homogeneity.

History of Silk Gene Duplications 

Our phylogenetic results are consistent with the hypothesis that spidroins coevolved with gland specialization, termed the glandular affiliation hypothesis by Hayashi and Lewis (1998). The first prediction of this hypothesis is that spidroins expressed in the same type of differentiated silk gland should be orthologous. Within Entelegynae (fig. 1), the monophyly of TuSp, Flag, and PySp (fig. 5) and their expression in tubuliform, flagelliform, and pyriform glands, respectively, lend support to the orthology of spidroins with gland-specific expression. C-terminal domains also supported the monophyly of AcSp1 (fig. 5). Furthermore, relationships among AcSp1 C-termini reflect putative species relationships (fig. 1): the deinopoid U. diversus is sister to a clade of araneoids that includes monophyletic Latrodectus and monophyletic araneid sequences (fig. 5). In their report of a partial AcSp1 cDNA from L. hesperus, Vasanthavada et al. (2007) suggested that their "AcSp1-like" sequence might not be orthologous to araneid AcSp1 and predicted the presence of a second copy of AcSp1. However, the "AcSp1-like" cDNA was nearly identical, in both coding and noncoding regions, to our completely sequenced AcSp1 gene and to partial AcSp1 sequences from four additional genomic clones and four individual spiders. Inspection of chromatograms from directly sequenced PCR amplifications of AcSp1 from individual spiders revealed that double peaks, which indicate variation within a single genome corresponding either to two alleles at a single locus (heterozygosity) or to more than one locus, were never found at the same site in all four individuals. Thus, we found no evidence for a second copy of AcSp1 in the L. hesperus genome. Instead, each of the genomic clones and the previously described cDNA likely represent allelic variants of the same locus.
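The logic of the double-peak test can be expressed as a set intersection. This is an illustrative sketch with hypothetical site positions; it assumes that a second, fixed paralogous locus amplified by the same primers would produce double peaks at the same diagnostic sites in every individual, whereas heterozygous sites at a single locus would differ among individuals.

```python
def shared_double_peak_sites(sites_by_individual: dict) -> set:
    """Alignment positions that carry a double peak in every individual.

    Assumption: a co-amplified, fixed paralog would show double peaks at its
    diagnostic sites in all individuals; allelic heterozygosity would not.
    """
    site_sets = [set(s) for s in sites_by_individual.values()]
    if not site_sets:
        return set()
    return set.intersection(*site_sets)


# Hypothetical double-peak positions, one set per directly sequenced spider
observed = {
    "spider_1": {112, 407},
    "spider_2": {407, 880},
    "spider_3": {112},
    "spider_4": {880},
}
print(shared_double_peak_sites(observed))  # set()
```

An empty intersection, as in this toy example, is what a single locus with varying heterozygous sites would produce.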

The glandular affiliation hypothesis also predicts that relationships among spidroin paralogs mirror relationships among gland types. Shultz (1987) suggested that aciniform-shaped glands are the ancestral type for both spider infraorders, Mygalomorphae and Araneomorphae (fig. 1). Each silk gland is connected to its own spigot, which is visible on the external anatomy of a spider. Spigots vary in size, shape, and sculpturing according to the type of silk gland to which they are connected. Based on the broad taxonomic distribution of morphologically distinguishable aciniform, pyriform, and major ampullate-shaped glands, or their diagnostic spigots, in all examined members of Araneomorphae, their common ancestor is thought to have possessed aciniform, pyriform, and major ampullate glands (Kovoor 1987; Platnick et al. 1991; Griswold et al. 2005). Minor ampullate glands are also widely distributed among araneomorphs and were likely present in the common ancestor of Haplogynae and Entelegynae (fig. 1; Griswold et al. 2005). Tubuliform glands, which are distinguished by their presence in adult females but not adult males, are present in most representatives of Entelegynae that have been examined but are found in only two families of Haplogynae (Kovoor 1987). Tubuliform glands of Entelegynae are thus considered to have been derived independently from those in Haplogynae (Platnick et al. 1991). Intriguingly, tubuliform glands in entelegynes are virtually indistinguishable from aciniform glands during early development (Richter 1970; Shultz 1987). Furthermore, the number of aciniform glands in adult males of Peucetia and Oxyopes (RTA-clade, Oxyopidae) is equal to the number of aciniform plus tubuliform glands in adult females (Kovoor and Munoz-Cuevas 1998), suggesting that tubuliform glands are specialized aciniform glands (Shultz 1987).

Consistent with the glandular affiliation hypothesis, we found a sister relationship between AcSp1 and TuSp1 (fig. 4). The gene duplication event that gave rise to the AcSp1 and TuSp1 paralogs is at least as old as the divergence of orbicularian and RTA-clade spiders (fig. 1), ~240 Ma (Ayoub and Hayashi 2009). This divergence could have occurred in the common ancestor of all Entelegynae families (Griswold et al. 2005) or more recently, if Orbiculariae and the RTA-clade are united to the exclusion of other Entelegynae families (Griswold et al. 1999; Coddington 2005). In either case, we predict that Haplogynae should possess spidroins (expressed in aciniform glands) that are orthologous to the entelegyne AcSp1 plus TuSp1 clade.

Gene duplication events leading to other functionally distinct paralogs are not well supported by our phylogenetic results, but the basal position of the AcSp1 plus TuSp1 clade among araneomorph spidroins (fig. 4) is consistent with aciniform glands representing a gland type that differentiated early in the history of spiders. Additionally, the grouping of araneoid PySp with spidroins from Haplogynae in multiple analyses is consistent with very ancient origins of PySp, perhaps concomitant with the origin of pyriform glands in the common ancestor of Araneomorphae (fig. 5; supplementary fig. S7, Supplementary Material online). More intensive taxonomic sampling of both N- and C-terminal spidroin domains should clarify the history of these ancient duplication events.

Evolution of Repeat Length and Complexity 
Spidroin repeat units are extremely variable in sequence among paralogs but can be grouped into three basic categories (referred to here as 1, 2, and 3). (Category 1) The first category includes spidroins with long (e.g., >150 aa) repeat units that have complex amino acid compositions and little internal repetition. AcSp1, TuSp1, mygalomorph spidroins, and some Haplogynae spidroins (e.g., P.t. fibroins 3 and 4) fall into this category (Gatesy et al. 2001; Hayashi et al. 2004; Garb and Hayashi 2005; Garb et al. 2007; Starrett et al. 2012). (Category 2) In contrast, MaSp1 and MaSp2 of Entelegynae have short ensemble repeats that are almost entirely composed of internal repetitions of simple amino acid motifs (e.g., Gatesy et al. 2001; Ayoub, Garb, Tinghitella, et al. 2007). (Category 3) Other spidroins do not fit neatly into either of these two categories but combine elements of both. For instance, the Flag ensemble repeat is almost entirely composed of iterations of GPGXX, but the repeated region is very long (e.g., >200 aa) and punctuated by short spacers (e.g., 27 aa) that have a complex amino acid sequence (Hayashi and Lewis 1998). PySp, MiSp, and some Haplogynae spidroins (e.g., P.t. fibroin 1) also fit into category 3 (Gatesy et al. 2001; Blasingame et al. 2009). The broad taxonomic distribution of the first category (long complex repeats) and our rooted spidroin trees suggest that a long repeat unit with complex amino acid composition is the ancestral spidroin condition (figs. 4 and 5; Starrett et al. 2012). Within Orbiculariae, the group of spiders with the most diverse glands and spidroin paralogs, AcSp1 and TuSp1 have retained these features. Retention of these characters may be related to the ecological functions of AcSp1 and TuSp1, which include prey wrapping and egg protection. The functions of spidroins containing simple amino acid motifs (e.g., MaSp1, MaSp2, Flag) include aspects of aerial web building that have demanding tensile requirements, which may have selected for multiple independent shortening and simplification events of spidroin repeats (Garb et al. 2010).

Among spidroins with category 1 (long) repeats, L. hesperus and U. diversus AcSp1 repeats stand out as exceptionally long (375 and 357 aa, respectively). Araneid AcSp1 repeats are 200-215 aa, TuSp1 repeats are 180-294 aa, and mygalomorph spidroin repeats are 169-181 aa, except for fibroin 1 of Euagrus chisoseus and of Megahexura fulva, which are 342 and 365 aa, respectively (Garb et al. 2007; Starrett et al. 2012). Intriguingly, most of the spidroins with repeats longer than 340 aa can be divided into two subrepeats of approximately equal length (e.g., supplementary fig. S3, Supplementary Material online), suggesting that the ancestral spidroin repeat length is slightly less than 200 aa. Shifts in the periodicity of intragenic concerted evolution events from ~600 to ~1,200 bp could have led to the doubling in size of each of these spidroin repeats. In the case of AcSp1, this shift could have taken place in an orbicularian ancestor or earlier, with a reversal to the smaller repeat unit in araneids. Alternatively, Latrodectus and U. diversus may have independently evolved longer periodicity while araneids retained the ancestral condition.

Conclusions 

Spider aciniform silk is unique in terms of function (prey wrapping), mechanical properties (it is one of the toughest silks), and molecular structure. Our complete black widow AcSp1 gene is the longest coding sequence described for a spider gene and includes 16 iterations of repeating units that are nearly identical at the amino acid and nucleotide levels. Each repeat encodes a complex amino acid sequence that is conserved across AcSp1 repeats of other species and, at a lower level of similarity, with other spidroin types, such as TuSp1. The homogeneity and complexity of repeats likely contribute to the mechanical properties of aciniform silk and its proper function during prey wrapping. However, stabilizing selection among repeats or purifying selection on amino acid sequence alone cannot explain the extreme homogenization of AcSp1 repeats, suggesting that other forms of selection and/or concerted evolution contribute to its molecular structure. AcSp1 possesses many features that are presumed ancestral in spidroins, such as a long repeat unit with a complex amino acid sequence that lacks extensive subrepeats. Our phylogenetic results are consistent with the coevolution of spidroin gene duplication events and gland specialization. Specifically, tubuliform glands and aciniform glands are likely derived from an aciniform-like ancestral gland. The sister relationship between TuSp1 and AcSp1 suggests that gene duplication and expression divergence of TuSp1 and AcSp1 occurred concomitantly with gland differentiation. Finally, our complete AcSp1 can be used as a template for synthesis of recombinant aciniform silk via transgenic technology. As we increase our understanding of the role of noncoding flanking sequences in the regulation of spidroins, we will be able to capitalize on these regions of genomic sequence to increase the artificial production of spider silks.

Materials and Methods

Sequencing 

We screened an L. hesperus genomic library with PCR for clones containing AcSp1 (see Ayoub, Garb, Tinghitella, et al. 2007 for library construction and screening protocols). Primers used in screening were designed from the repetitive region of a partial-length cDNA (EU025854; primers listed in supplementary table S5, Supplementary Material online). An ~48 kb AcSp1-containing clone was shotgun sequenced and assembled to 8X coverage by Qiagen (Hilden, Germany), resulting in three contiguous sequences (contigs). One contig contained the 5' portion of AcSp1, a second contig included the 3' portion, and the third contig consisted of the cloning vector and some insert sequence. All three contigs contained AT microsatellites at one end; the exact number of base pairs in the AT microsatellite regions was difficult to confirm. Comparisons of restriction enzyme recognition sites predicted from the contigs with restriction enzyme digests of the clone indicated a ~500 bp gap within the noncoding region of the insert and a ~500 bp gap within AcSp1. We used primer walking to close the gap in the noncoding region, but this approach was not possible for the gap within AcSp1 because of the extreme similarity among repeats (see Results). Instead, we digested the clone with EcoRV (NEB), ligated three fragments (2.1, 7, and 7.8 kb) that contained only AcSp1 coding sequence into pZErO™-2 plasmids, and then transformed TOP10 electrocompetent Escherichia coli cells (Invitrogen). We sequenced the ends of each of these subclones and performed a de novo assembly of sequences generated from primer walking, EcoRV subcloning, and shotgun sequencing using PREDPHRAP (Ewing and Green 1998; Ewing et al. 1998). We then manually edited the assembly using CONSED v.19 (Gordon et al. 1998, 2001; Gordon 2004) so that the predicted restriction enzyme digest agreed with the observed digest pattern of the clone.
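Predicted digest patterns like those compared with the observed EcoRV digest can be sketched by locating recognition sites in an assembled sequence. This is a minimal illustration, not the software used in the study: it assumes the EcoRV site GATATC with a blunt cut between GAT and ATC, handles linear or circular input, and runs on a toy sequence.

```python
import re

ECORV_SITE = "GATATC"   # palindromic, so scanning one strand suffices
ECORV_CUT_OFFSET = 3    # blunt cut: GAT^ATC


def ecorv_fragment_sizes(seq: str, circular: bool = False) -> list:
    """Predicted EcoRV fragment sizes (bp), largest first, as read on a gel.

    For circular input, a site spanning the sequence origin is not detected.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping matches would not be skipped
    cuts = [m.start() + ECORV_CUT_OFFSET
            for m in re.finditer(f"(?={ECORV_SITE})", seq)]
    if not cuts:
        return [len(seq)]
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(len(seq) - cuts[-1] + cuts[0])  # fragment spanning the origin
    else:
        bounds = [0] + cuts + [len(seq)]
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)


toy = "AAA" + "GATATC" + "CCCC" + "GATATC" + "TT"   # 21 bp toy sequence
print(ecorv_fragment_sizes(toy))                 # [10, 6, 5]
print(ecorv_fragment_sizes(toy, circular=True))  # [11, 10]
```

Applied to an assembly, a mismatch between predicted and observed fragment sizes flags regions (such as collapsed repeats) where the assembly is likely too short.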

We amplified AcSp1 from L. geometricus genomic DNA using primers designed from the L. hesperus AcSp1 genomic sequence (primers in supplementary table S5, Supplementary Material online). We sequenced the following coding regions: 779 bp of the N-terminus and adjacent repeat, 850 bp of the C-terminus and adjacent repeat, and three fragments of AcSp1 repeats. We assembled the three fragments from the repetitive region (those not adjacent to the N- or C-termini) into a single 1,200 bp fragment using Sequencher v.4.9 (GeneCodes) and considered this sequence to represent the consensus AcSp1 repeat in L. geometricus. Double peaks in these chromatograms were scored as polymorphic positions using IUPAC ambiguity codes. Polymorphic positions could represent allelic variation (heterozygosity) or intragenic repeat variation.
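Scoring double peaks with IUPAC ambiguity codes amounts to mapping each pair of bases to a single symbol. The helper below is a hypothetical sketch using the standard two-base codes; real base calling from chromatogram traces (peak detection and height thresholds) is outside its scope.

```python
# Standard IUPAC codes for two-base ambiguities
IUPAC_TWO_BASE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_base(peaks: str) -> str:
    """Convert the base(s) read at one chromatogram position to a single symbol.

    `peaks` holds one base for a clean peak or two bases for a double peak.
    """
    bases = frozenset(peaks.upper())
    if len(bases) == 1:
        return next(iter(bases))
    try:
        return IUPAC_TWO_BASE[bases]
    except KeyError:
        raise ValueError(f"unsupported peak combination: {peaks!r}") from None


# Hypothetical calls at four consecutive chromatogram positions
print("".join(call_base(p) for p in ["A", "GA", "T", "CT"]))  # ARTY
```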

We assessed L. hesperus allelic variation by directly sequencing four additional AcSp1-containing genomic clones with C-terminal primers (the genomic library was constructed from multiple individuals). We also amplified the C-terminal and adjacent repeat-encoding sequence or adjacent downstream noncoding regions from four individual L. hesperus spiders (three collected in Riverside, CA, and one collected in Tucson, AZ).

Homogenization and Conservation of AcSp1
We identified all ORFs longer than 300 bp in the L. hesperus genomic clone using ORF Finder (NCBI). We identified AcSp1 using conceptual translations and BLASTX (Altschul et al. 1990; universal genetic code) comparisons with other spidroins. We considered the first in-frame Met to begin AcSp1. Amino acid content and codon usage were determined with CODONW (http://codonw.sourceforge.net/, last accessed November 20, 2012). Hydrophobicity plots were constructed with MacVector 12.5 (MacVector, Inc.) using a window size of eight amino acids. For comparison, hydrophobicity was also plotted for L. hesperus MaSp1 (EF595246).
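A sliding-window hydrophobicity profile of the kind plotted here averages a per-residue hydropathy value over each window. The text does not state which scale MacVector was set to use, so the Kyte-Doolittle scale below is an assumption; the window of eight residues matches the Methods, and the example peptide is invented.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the study's setting is not stated)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 8) -> list:
    """Mean hydropathy of each full window, sliding one residue at a time.

    Raises KeyError for non-standard residue letters (e.g. X or *).
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if window < 1 or window > len(values):
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Hypothetical peptide: a Gly/Ala-rich stretch followed by a run of alanines
profile = hydropathy_profile("GAGAGAGASSAAAAAA")
print(len(profile))  # 9
```

Each value is plotted at its window position; positive values mark hydrophobic stretches.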

L. hesperus AcSp1 amino acid repeats were identified by eye, separated, and manually aligned. This alignment was used to identify and align nucleotide repeats. We checked every polymorphic position among AcSp1 repeats in the assembly to ensure that single chromatograms contained all polymorphic positions within a single repeat. This was done to confirm that each of the repeats reported (see Results) existed in the genomic clone.
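Repeat homogeneity, such as the near identity among AcSp1 repeats, can be summarized as mean pairwise identity across aligned repeats. The sketch below is illustrative only: it assumes repeats are already aligned to equal length and excludes gapped columns pair by pair, which is one of several reasonable conventions; the fragments are invented.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def mean_pairwise_identity(aligned_repeats: list) -> float:
    """Average percent identity over all pairs of aligned repeats."""
    pairs = list(combinations(aligned_repeats, 2))
    if not pairs:
        raise ValueError("need at least two repeats")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned repeat fragments (not AcSp1 sequence)
repeats = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGT-A"]
print(round(mean_pairwise_identity(repeats), 2))  # 92.59
```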

We searched the NCBI nr protein database using BLASTP with an L. hesperus AcSp1 repeat to determine whether the repetitive region of AcSp1 was significantly conserved among species (E value < 10⁻⁵). We then manually aligned L. hesperus AcSp1 repeats to our L. geometricus sequence and to those identified by BLAST, using BLASTP alignments as a preliminary guide. These included sequences from three species in the family Araneidae, Aran. ventricosus (ADM35668), A. trifasciata (AAR83925), and A. amoena (ADM35669), and one in Uloboridae, U. diversus (ABD61598).
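The E-value cutoff can be applied to BLAST tabular output. This sketch assumes results were saved with `-outfmt 6` in its default 12-column layout (E value in the 11th column); the subject IDs and values in the example are invented.

```python
def significant_hits(tabular_lines, max_evalue=1e-5):
    """(subject ID, E value) pairs below the cutoff, from BLAST -outfmt 6 text.

    Assumes the default 12 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. -outfmt 7 headers)
        fields = line.rstrip("\n").split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue < max_evalue:
            hits.append((subject, evalue))
    return hits


# Invented example rows
example = [
    "repeat1\tsubj_1\t45.2\t210\t110\t5\t1\t210\t3\t212\t2e-40\t150",
    "repeat1\tsubj_2\t28.0\t50\t34\t2\t10\t59\t7\t56\t0.3\t30.1",
]
print(significant_hits(example))  # [('subj_1', 2e-40)]
```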

Pairwise numbers of synonymous substitutions per synonymous site (Ks) and nonsynonymous substitutions per nonsynonymous site (Ka) between repeats were calculated using DNASP v.5 (Librado and Rozas 2009). We compared substitution patterns within the repeats with those of adjacent terminal-encoding regions by calculating interspecific Ks and Ka between L. hesperus and L. geometricus AcSp1 repeats, N-termini, and C-termini. For comparison with paralogous spidroins, we similarly calculated intragenic and interspecific Ks and Ka values for a partial-length cDNA of L. hesperus TuSp1 (AY953070) and a complete genomic copy of L. hesperus MaSp1 (EF595246). Intragenic comparisons of MaSp1 focused on the 20 "aggregate repeats" identified in Ayoub, Garb, Tinghitella, et al. (2007).
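DNASP counts synonymous and nonsynonymous sites and differences itself; the sketch below shows only the Jukes-Cantor correction that is commonly applied to those proportions (as in the Nei-Gojobori approach) and the resulting Ka/Ks ratio. Whether this matches the exact DNASP options used here is an assumption, and the counts in the example are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from an observed proportion of differences."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-counted differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    ratio = ka / ks if ks > 0 else float("nan")  # undefined when Ks is zero
    return ka, ks, ratio


# Hypothetical counts for one pair of aligned repeats (not data from this study)
ka, ks, ratio = ka_ks(nonsyn_diffs=3, nonsyn_sites=850.0, syn_diffs=4, syn_sites=275.0)
print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ratio:.2f}")
```

A Ka/Ks well below 1 indicates that amino acid replacements accumulate more slowly than silent changes, the signature of purifying selection discussed above.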

We also searched for conserved sequences in the flanking regions of AcSp1 using a variety of methods (see supplementary methods and results, Supplementary Material online). These flanking regions should contain elements involved in the gland-specific regulation of AcSp1.

Relationship of AcSp1 to Other Spidroins
We added our L. hesperus and L. geometricus AcSp1 N- and C-terminal coding sequences to an alignment of 26 spidroin termini generated by Garb et al. (2010). We also added N- and C-terminal coding sequences for "fibroin 1a" from Deinopis spinosa (Deinopidae; see supplementary table S2, Supplementary Material online). N-terminal sequences were not available for many spidroins, so we added 17 more spidroin sequences to the C-terminal alignment to represent spidroin gene family diversity more comprehensively in our analyses (supplementary table S2, Supplementary Material online). The expanded data sets were aligned with MUSCLE (Edgar 2004) implemented in SEAVIEW v.4.1 (Galtier et al. 1996) and manually edited. Amino acid alignments were used to guide nucleotide alignments in SE-AL v.2.0a11 (http://tree.bio.ed.ac.uk/software/seal/, last accessed November 20, 2012).

We conducted heuristic searches for maximum parsimony (MP) and maximum likelihood (ML) trees based on amino acid and nucleotide alignments in PAUP* v4.0b10 (Swofford 2002) using tree bisection reconnection branch swapping and 1,000 (MP) or 10 (ML) replicates of random stepwise addition of taxa. Support for clades recovered in MP analyses was evaluated with 1,000 bootstrap pseudoreplicates and 10 random addition sequences per pseudoreplicate. Support for clades was further evaluated by calculating decay indices (Bremer 1988; Baker and DeSalle 1997) with the assistance of TREEROT v.3 (Sorenson and Franzosa 2007). Bayesian analyses were carried out with MRBAYES v.3.1.2 (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003). Optimal models of evolution were determined separately for N- and C-termini, with JMODELTEST v0.1.1 (Posada 2008) for nucleotide sequences and PROTTEST v2.4 (Abascal et al. 2005) for protein sequences. Combined analysis of nucleotides employed a model partitioned by N- and C-termini. Combined analysis of amino acids employed a mixed model, which allowed estimation of the optimal model of protein evolution during the Bayesian analysis. Default priors and Metropolis-coupled Markov chain Monte Carlo sampling procedures were executed for two independent runs carried out simultaneously and sampled every 100th generation. Convergence was assessed every 1,000th generation, and the posterior distribution was considered adequately sampled when the standard deviation of split frequencies between the two runs dropped below 0.01 (1-5 million generations, depending on data set).
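The convergence criterion, the standard deviation of split frequencies between two runs, can be sketched as follows. This is an illustrative reimplementation, not MrBayes code: it uses the population standard deviation across runs and ignores splits below a 0.10 frequency in every run, conventions that may differ in detail from the program's own; the split frequencies are hypothetical.

```python
from statistics import pstdev


def asdsf(run_split_freqs, min_freq=0.10):
    """Average standard deviation of split frequencies across independent runs.

    Each run maps a split label (any hashable) to its posterior frequency.
    Splits below `min_freq` in every run are ignored.
    """
    all_splits = set().union(*run_split_freqs)
    sds = []
    for split in all_splits:
        freqs = [run.get(split, 0.0) for run in run_split_freqs]
        if max(freqs) < min_freq:
            continue
        sds.append(pstdev(freqs))
    return sum(sds) / len(sds) if sds else 0.0


# Hypothetical split frequencies from two runs; each label names one bipartition
run1 = {"{A,B}|{C,D,E}": 0.99, "{A,B,C}|{D,E}": 0.90, "{A,D}|{B,C,E}": 0.02}
run2 = {"{A,B}|{C,D,E}": 1.00, "{A,B,C}|{D,E}": 0.94}
value = asdsf([run1, run2])
print(f"ASDSF = {value:.4f}; below 0.01: {value < 0.01}")
```

In this toy case the two runs still disagree on one split, so sampling would continue.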

We determined the root of spidroin trees by gene tree-species tree reconciliation, which minimizes the number of inferred gene duplications and losses given a species tree, using NOTUNG v.2.6 (Durand et al. 2006; Vernot et al. 2008) with default cost parameters. Our species tree (fig. 1) is based on a number of previously published phylogenetic hypotheses for spiders. Family-level relationships were based on Coddington (2005) for Araneomorphae and Ayoub, Garb, Hedin, et al. (2007) for Mygalomorphae. Lower-level relationships followed those described by Kuntner et al. (2008) for Nephila, Scharff and Coddington (1997) for Araneidae, and Elices et al. (2009) for Argiope.
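Root choice by reconciliation can be illustrated by counting the duplications that each candidate rooting implies under the least-common-ancestor (LCA) mapping of gene-tree nodes onto the species tree. This is a simplified sketch, not NOTUNG: it counts duplications only (no losses or event costs), and the gene and species trees are toy examples written as nested tuples whose leaves are species names.

```python
def clusters(tree):
    """Post-order list of leaf sets; the last entry is the whole tree's leaf set."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    out, child_sets = [], []
    for child in tree:
        sub = clusters(child)
        out.extend(sub)
        child_sets.append(sub[-1])
    out.append(frozenset().union(*child_sets))
    return out


def count_duplications(gene_tree, species_tree):
    """Duplications implied by LCA mapping of a rooted gene tree onto a species tree."""
    species_clusters = clusters(species_tree)

    def lca(species):
        # Smallest species-tree clade containing all the given species
        return min((c for c in species_clusters if species <= c), key=len)

    def visit(node):
        if isinstance(node, str):
            return lca(frozenset([node])), 0
        child_maps, dups = [], 0
        for child in node:
            mapped, d = visit(child)
            child_maps.append(mapped)
            dups += d
        here = lca(frozenset().union(*child_maps))
        # A node is a duplication if it maps to the same species clade as a child
        if here in child_maps:
            dups += 1
        return here, dups

    return visit(gene_tree)[1]


species = (("A", "B"), "C")
rooting_1 = (("A", "B"), ("A", "B"))   # root between the two paralog clades
rooting_2 = ("A", ("B", ("A", "B")))   # root on the branch leading to one A copy
print(count_duplications(rooting_1, species), count_duplications(rooting_2, species))  # 1 2
```

Both rootings describe the same unrooted gene tree; the first implies fewer duplications and would be preferred under this criterion.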

Supplementary Material 

Supplementary methods and results, figures S1-S7, and tables S1-S5 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).